
Label-Imbalanced and Group-Sensitive Classification under Overparameterization

Ganesh Ramachandra Kini, Orestis Paraskevas, Samet Oymak, Christos Thrampoulidis

Categories: Machine Learning | Machine Learning (Statistics)

2021-03-02

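The goal in label-imbalanced and group-sensitive classification is to optimize relevant metrics such as balanced error and equal opportunity. Classical methods, such as weighted cross-entropy, fail when deep nets are trained to the terminal phase of training (TPT), that is, training beyond zero training error. This observation has motivated a recent flurry of activity in developing heuristic alternatives that follow the intuitive mechanism of promoting a larger margin for minorities. In contrast to previous heuristics, we follow a principled analysis explaining how different loss adjustments affect margins. First, we prove that for all linear classifiers trained in TPT, it is necessary to introduce multiplicative, rather than additive, logit adjustments so that the interclass margins change appropriately. To show this, we discover a connection between the multiplicative cross-entropy (CE) modification and cost-sensitive support vector machines. Perhaps counterintuitively, we also find that, at the start of training, the same multiplicative weights can actually harm the minority classes. Thus, while additive adjustments are ineffective in the TPT, we show that they can speed up convergence by countering the initial negative effect of the multiplicative weights. Motivated by these findings, we formulate the vector-scaling (VS) loss, which captures existing techniques as special cases. Moreover, we introduce a natural extension of the VS-loss to group-sensitive classification, thus treating the two common types of imbalance (label/group) in a unified way. Importantly, our experiments on state-of-the-art datasets are fully consistent with our theoretical insights and confirm the superior performance of our algorithms. Finally, for imbalanced Gaussian-mixture data, we perform a generalization analysis, revealing tradeoffs between balanced/standard error and equal opportunity.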
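As a rough illustration of the difference between multiplicative and additive logit adjustments, the sketch below applies both to the logits before a weighted cross-entropy. The function names, the parameter names, and the frequency-based choice of adjustments in `frequency_adjustments` are assumptions made for illustration; the abstract itself does not specify them.

```python
import math


def vs_style_loss(logits, label, mult, add, weight=1.0):
    """Weighted cross-entropy on per-class adjusted logits.

    Each logit z_c is replaced by mult[c] * z_c + add[c].
    (Illustrative sketch; names and signature are hypothetical.)
    """
    adjusted = [m * z + a for z, m, a in zip(logits, mult, add)]
    # log-sum-exp with the maximum subtracted for numerical stability
    top = max(adjusted)
    log_norm = top + math.log(sum(math.exp(v - top) for v in adjusted))
    return weight * (log_norm - adjusted[label])


def frequency_adjustments(counts, gamma, tau):
    """One assumed way to pick adjustments from class counts.

    Multiplicative factor: (n_c / n_max) ** gamma
    Additive term:         tau * log(n_c / n_total)
    """
    n_max, n_total = max(counts), sum(counts)
    mult = [(n / n_max) ** gamma for n in counts]
    add = [tau * math.log(n / n_total) for n in counts]
    return mult, add


# With mult = 1 and add = 0 the loss reduces to plain cross-entropy.
plain = vs_style_loss([0.0, 0.0], 0, [1.0, 1.0], [0.0, 0.0])

# Majority class with 900 samples, minority class with 100 samples.
mult, add = frequency_adjustments([900, 100], gamma=0.5, tau=1.0)
adjusted_loss = vs_style_loss([2.0, -1.0], 1, mult, add)
```

Setting every multiplicative factor to 1 leaves an additive-only adjustment, and setting every additive term to 0 leaves a multiplicative-only adjustment, which is one way to read the claim that a single loss covers existing techniques as special cases.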


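In most data-scientific approaches, the principle of Maximum Entropy (MaxEnt) is used to justify, a posteriori, some parametric model that has already been chosen based on experience, prior knowledge, or computational simplicity. In a formulation perpendicular to conventional model building, we start from the linear system of phenomenological constraints and asymptotically derive the distribution over all viable distributions that satisfy the provided set of constraints. The MaxEnt distribution plays a special role, as it is the most typical among all phenomenologically viable distributions and represents a good expansion point for large-N techniques. This enables us to consistently formulate hypothesis testing in a fully data-driven manner. The appropriate parametric model supported by the data can always be deduced at the end of model selection. In the MaxEnt framework, we recover major scores and selection procedures used in multiple applications and assess their ability to capture associations in the data-generating process and to identify the most generalizable model. This data-driven counterpart of standard model selection demonstrates the unifying perspective of the deductive logic advocated by the MaxEnt principle, while potentially offering new insights into the inverse problem.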


Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes. Our generalization bound also shows that learned representations can reduce (labeled) sample complexity on downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.
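The objective described above, pushing the inner product with a similar point above the inner products with negative samples, can be sketched as a logistic loss over inner-product gaps. This is a minimal sketch under assumptions: the function names, the choice of the logistic loss, and the toy vectors are illustrative and are not taken from the abstract.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_logistic_loss(anchor, positive, negatives):
    """Logistic loss on the gaps <anchor, positive> - <anchor, negative>.

    The loss is small when the anchor is more aligned with the
    positive than with every negative sample.
    (Illustrative sketch; names are hypothetical.)
    """
    gaps = [dot(anchor, positive) - dot(anchor, neg) for neg in negatives]
    return math.log(1.0 + sum(math.exp(-g) for g in gaps))


# Toy representations: the positive is aligned with the anchor,
# the negative is orthogonal to it.
anchor = [1.0, 0.0]
aligned = contrastive_logistic_loss(anchor, [1.0, 0.0], [[0.0, 1.0]])
# Swapping the roles of positive and negative increases the loss.
swapped = contrastive_logistic_loss(anchor, [0.0, 1.0], [[1.0, 0.0]])
```

In practice the vectors would be outputs of a learned representation function applied to data points; fixed lists are used here only to keep the example self-contained.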
